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Capture-recapture data are often collected when abundance esti¬ 
mation is of interest. In this manuscript we focus on abundance esti¬ 
mation of closed populations. In the presence of unobserved individ¬ 
ual heterogeneity, specihed on a continuous scale for the capture prob¬ 
abilities, the likelihood is not generally available in closed form, but 
expressible only as an analytically intractable integral. Model-fitting 
algorithms to estimate abundance most notably include a numerical 
approximation for the likelihood or use of a Bayesian data augmen¬ 
tation technique considering the complete data likelihood. We con¬ 
sider a Bayesian hybrid approach, defining a “semi-complete” data 
likelihood, composed of the product of a complete data likelihood 
component for individuals seen at least once within the study and 
a marginal data likelihood component for the individuals not seen 
within the study, approximated using numerical integration. This 
approach combines the advantages of the two different approaches, 
with the semi-complete likelihood component specihed as a single 
integral (over the dimension of the Individual heterogeneity com¬ 
ponent). In addition, the models can be fttted within BUGS/JAGS 
(commonly used for the Bayesian complete data likelihood approach) 
but with signiftcantly improved computational efftciency compared to 
the commonly used super-population data augmentation approaches 
(between about 10 and 77 times more efficient in the two examples 
we consider). The semi-complete likelihood approach is hexible and 
applicable to a range of models, including spatially explicit capture- 
recapture models. The model-htting approach is applied to two dif¬ 
ferent datasets: the hrst relates to snowshoe hares where model Mh is 
applied and the second to gibbons where a spatially explicit capture- 
recapture model is applied. 


1. Introduction. In order to estimate total abundance capture-recapture data are often col¬ 
lected on the population under study. Capture-recapture data collection methods involve partially 
observing the population at a series of capture events (or using a number of different sources), such 
that each individual observed within the study is uniquely identifiable. Assuming that marks are 
unique and cannot be lost, a capture history for each individual observed within the study can 
be constructed, detailing whether the given individual is observed or not at each capture event. 
Statistical models can be constructed and applied to capture-recapture data to estimate the num¬ 
ber of individuals in the population that are not observed. We focus on closed population models, 
where it is assumed that that there are no births/deaths/migrations in the population within 
the study period. Applications include estimating the number of injecting drug users (King et al, 
2014; Overstall et ai, 2014), pages on the world wide web (Fienberg et al, 1999), disease preva¬ 
lence (Manrique-Vallier and Fienberg, 2008) and animal populations (Borchers et al, 2002). We 
focus on statistical models for ecological data where individuals are observed at a series of capture 
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events. For further discussion of ecological (closed) capture-recapture data, and the underlying as¬ 
sumptions, see for example, Borchers et al. (2002), Williams et al. (2002) and McCrea and Morgan 
(2014). 

In general, the likelihood of capture-recapture data can be expressed in multinomial form, where 
the different multinomial cells correspond to each possible capture history and the cell entries to 
the number of individuals with the given capture history. The unknown parameters to be estimated 
in the likelihood function are the capture (or detection) probabilities and the total population size 
(or number of individuals in the population unobserved at any capture event). Otis et al. (1978) 
described three different possible effects on the capture probabilities corresponding to temporal 
(t), behavioural (6) and individual heterogeneity (h) effects. We adopt the standard notation and 
describe the different models by Ma, such that o C {t,b,h}, corresponding to the combination of 
effects in the given model. 

In this paper we focus on models that include individual heterogeneity (i.e. M/j-type mod¬ 
els). Individual heterogeneity is often introduced by specifying the capture probabilities as a fi¬ 
nite or infinite mixture. Finite mixture models lead to an explicit likelihood expression which 
can be maximised numerically to obtain the maximum likelihood estimates (MLEs) of the pa¬ 
rameters of interest (Pledger, 2000). Infinite mixture models specify the individual heterogene¬ 
ity as a random effects model. For the special case of a Beta-Binomial random effects com¬ 
ponent the likelihood is available in closed form (Dorazio and Royle, 2003; Morgan and Ridout, 
2008). We will consider the more general case, with an arbitrary individual heterogeneity com¬ 
ponent leading to an analytically intractable likelihood. Previous approaches to fit such models 
to the data include (i) numeral integration to estimate the marginal (or observed) data likeli¬ 
hood (Coull and Agresti, 1999; Borchers and Efford, 2008; Gimenez and Choquet, 2010); and (ii) 
Bayesian data augmentation techniques, using a complete data likelihood approach (corresponding 
to the joint probability density function of the capture histories and individual effects), integrating 
out the individual heterogeneity component within a Markov chain Monte Carlo-type (MCMC) 
algorithm (Durban and Elston, 2005; Royle et al., 2007, 2009; King and Brooks, 2008; King et al., 
2009; Royle and Dorazio, 2012). We combine these two approaches defining a semi-complete data 
likelihood constructed as the product of a complete data likelihood component for the individ¬ 
uals seen at least once in the study and a marginal data likelihood component for the unseen 
individuals. This combines the advantages of each of the individual approaches. We note that sim¬ 
ilar approaches have been previously proposed for specific applications, using bespoke computer 
codes. Most notably, Fienberg et al. (1999) propose a conditional MCMC algorithm for Rasch-type 
models, employing a block update of the total population size and individual heterogeneity terms; 
while Bonner and Schofield (2014) consider a similar Monte Carlo in MCMC approach applied to 
individual covariate models. We describe how the latter approach is a special case of our gen¬ 
eral semi-complete data likelihood approach in Section 3.3. Finally, we demonstrate how individual 
heterogeneity models can be efficiently fitted using BUGS/JAGS with general prior structures spec¬ 
ified on all the model parameters (including the total population size) and provide the associated 
computer codes. 

The paper proceeds as follows. Section 2 describes the general closed population model structure 
and associated notation. Section 3 describes previous model-fitting approaches and the new pro¬ 
posed semi-complete data likelihood approach. The implications of the BUGS/JAGS specification 
for the semi-complete data likelihood and previous Bayesian complete data likelihood approaches 
are compared in Section 4 and the approaches applied and compared for two real examples: the 
first example relates to snowshoe hares where model is applied and the second to a dataset 
of gibbons where a spatially explicit capture-recapture model is applied. Finally in Section 5 we 
conclude with a discussion. 
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2. Individual heterogeneity models. We assume that within the capture-recapture study 
there is a series of T discrete capture occasions. Within the study a total of n distinct individuals 
are observed, with the total (unknown) population size denoted by N. For simplicity we arbitrarily 
number the observed individuals i = 1 ,..., n and the unobserved individuals i = n -|- 1 ,..., 

Let Pit denote the capture probability of individual i = 1,..., at time t = 1,..., T. Further, for 
standard capture-recapture data, Xi = {xu : t = 1,..., T} denotes the capture history of individual 
i = 1,... ,N, such that 

f 0 individual i is unobserved on occasion t; 

* ( 1 individual i is observed on occasion t. 

We consider individual heterogeneity specified such that 


Pit = 9{0,ei), 


for some function g, where 6 denotes the model parameters associated with the capture probabilities 
(which may include, for example, temporal and/or behavioural effect terms, regression coefficients 
for covariate values etc.) and e = {cj : i = 1,...,A^} such that e* S S C M^, corresponding to 
the individual heterogeneity term for individual i = 1,... ,N. Further, we assume an underlying 
model for the individual heterogeneity, such that e is a function of the parameters ij, and that the 
individual heterogeneity terms, e,, are independent of each other conditional on rj. The associated 
joint probability density function of the heterogeneity terms is given by /e(e| A^, 77 ) = 
using the conditional independence assumption (and dropping the dependence on N for the condi¬ 
tional density function of the individual heterogeneity terms for individual i). Further, to provide 
a general framework for both observed and unobserved individual heterogeneity we additionally 
write e = {£*^^*, 6 ^*®} where denotes the set of observed individual heterogeneity compo¬ 
nents and the set of unobserved individual heterogeneity components. Similarly, we write 
ej = for i = 1,...,A^ with obvious notation. Finally, we assume that the capture 

histories of the individuals are independent of each other given the capture probability model 
parameters, 6, and individual heterogeneity terms, e. 

The marginal data likelihood can be expressed in the form. 
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using the multinomial distributional form of the capture-recapture data (omitting the constant 
multinomial coefficients for simplicity), and conditional independence of the random effect terms. 
The term fc{x, e| A^, 6, rf) corresponds to the complete data likelihood (i.e. the joint probability den¬ 
sity function of the capture histories and individual effects); fx{x\N,d, e) the conditional likelihood 
of the capture histories (where the conditioning includes the individual heterogeneity terms); and 
/e(e|A^, 77) the joint probability density function of the individual heterogeneity terms. The term 
fx{xi\6, Ei) corresponds to the conditional likelihood of capture history for individual z = 1 ,..., A^; 
and /e(ej| 77 ) the conditional probability density function of the individual heterogeneity component 
for individual i = 1,... ,N (where in each case we drop the dependence on A^). 


Example 1 - Continuous individual covariates. We consider the case with q time-invariant con¬ 
tinuous individual covariates e = {ei,... ,eAr} where G § C Ri denotes the covariate values as¬ 
sociated with individual i = 1,..., N. Since the covariate values are time-invariant, the associated 
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capture probabilities for each individual are also time-invariant, so that pu = Pi for t = 1,... ,T. 
Assuming that the capture probabilities are linearly related to the covariate values via some link 
function, we may specify, 

g~^{Pi) = a + 

so that 6 = {a,/3}. Common choices for g~^ include the logit and probit functions. Additional 
individual/temporal random effects can be included in the capture probabilities, but we omit 
these here for simplicity (see Example 2). Further we specify a parametric model for the covariate 
values, assuming that conditional on the additional covariate parameters rf, the covariate values 
are independent. 

Assuming that for each individual observed within the study the set of individual covariate 
values is recorded, we have that = {e^ : i = 1,..., n} and = {e^ : i = , N}. More 

generally, the covariate values may not be recorded for all observed individuals. For example, the 
observation process may include sightings recorded from a distance (rather than physical captures) 
so that the covariate may not be able to be obtained if a physical capture is necessary (for example if 
the covariate corresponds to wingspan). In this case the set of unobserved individual heterogeneity 
terms is extended to include the unknown covariate values for observed individuals. 

The complete data likelihood is of the form, 

Ux,e\N,e,rj) oc 

AT] 

where pi is of the above form and yi = Ylt=i (denoting the total number of times individual i is 
observed). The first term of the complete data likelihood corresponds to the conditional likelihood 
(conditional on the individual covariate terms) and the second term to the individual covariate 
component. 

The marginal data likelihood integrates out the unobserved covariate values For notational 
simplicity we provide the marginal data likelihood for the special case where all covariate values 
are known for individuals observed within the study (i.e. = {e* : i = 1 ,... ,re} and = 

{ei : i = n + I ,..., A^}): 
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where g~^{pQ) = a + /3^eo- The extension to the case where observed individuals may also have 
unknown covariate values is immediate. 

We note, in general, the model can be extended to include time-varying individual covariates, us¬ 
ing the time and individual dependent capture probability, pu- This typically substantially increases 
the number of unobserved covariate values, since if an individual is not observed, the corresponding 
covariate value is necessarily also unknown. However, for closed populations, to satisfy the condi¬ 
tion that the population is closed the study period is generally short in duration so that changes 
in time-varying individual covariate values is likely to be limited. 


Example 2 - M^-type models. For M/^-type models the individual heterogeneity corresponds to 
an unobserved individual random effect component (so that = 0 and = e). For example. 
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for model we may set 9 = {a} and r] = {a^} such that, 

ei ~ iV(0,cj^), 

for i = where cr^ denotes the individual random effect variance and S = M. For this 

model, the capture probabilities are again independent of time t, so we can write pu = Pi for all 
t = 1,..., T, with 

= Oi + ei, 

for i = 1,... ,N and t = 1,...,T. Common choices for g~^ include the logit and probit functions. 
The extension to incorporate additional time and/or behavioural effects is immediate (i.e. models 
Mth,Mhh and Mfbh', see for example King and Brooks, 2008). 

The complete data likelihood for model M^, can be written in the form. 
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where pi is of the above form and yi = Ylt=i Once again, the first term of the complete data 
likelihood corresponds to the conditional likelihood (conditional on the individual random effect 
terms) and the second term to the individual effect component. 

The marginal data likelihood integrates out the e terms and (dropping the term since no 
individual heterogeneity terms are observed, i.e. = 0) can be efficiently expressed as, 
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where = 'Yld=id{yi = k) and denotes the number of individuals observed k times within the 
study, for /c = 0,...,T (so that uq is unobserved and N = riQ + n) and g~^{pk) = a + Cfc- 

Example 3 - SECR models. For traditional spatially explicit capture-recapture models, S C 
and the individual heterogeneity corresponds to the unobserved activity centre of the individual 
(so that = 0 and = e). The range of possible models is greater for SECR than non-spatial 
capture-recapture as SECR models involve multiple traps or detectors at different locations on each 
occasion and take account of the location(s) of observations within occasions. To this end we define 
Uj = (uji, Uj 2 ) G to be the Cartesian coordinates of trap j, for j = 1,..., J. We consider the 
likelihood for a study with binary detection data within occasion, such that 

_ J 0 individual i is unobserved by detector j on occasion t; 

( 1 individual i is observed by detector j on occasion t. 

We consider the case where individuals can be observed by more than one detector at each occa¬ 
sion and we assume that observations by different detectors within occasions (as well as between 
occasions) are independent. In this context, e* = (eji,ej 2 ) G (i = 1 ,..., A^) denote the Carte¬ 
sian coordinates of the activity centres of the N individuals in S C M^. It is usually assumed that 
these are independently uniformly distributed in § and do not change between occasions, so that 
fe{e\N,T]) = nili ff:{ei\r]) = A ^, where A is the area of S. The probability of individual i being 
observed by detector j at capture occasion t, denoted pijt is assumed to depend on only the dis¬ 
tance of the detector from the activity centre of individual i, so that pijt = g{9, \ \uj — CiU), where 

||wj — ej|| is the vector norm — ^ikY- The half-normal form is a common choice for g. 

For example, assuming that the capture probabilities are time-independent, we may specify. 


2a2 


Pijt = Pij = Po exp 
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with 6 = {pq, a^}. 

The complete data likelihood can be written as 


fc{x,e\N,e,r]) (X 


Nl 
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where ptjt is of the above form. The first term in the product over individuals corresponds to the 
conditional likelihood associated with individual i (conditional on the individual random effect 
terms) and the second term to the corresponding individual effect component. 

The marginal data likelihood integrates out the e, terms and can be expressed as, 
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once more omitting the term = 0. 


2.1. Model fitting. In the presence of individual heterogeneity leading to an analytically in¬ 
tractable marginal data likelihood a range of different approaches have been proposed. These in¬ 
clude a (classical) numerical integration approach, approximating the marginal data likelihood and a 
(Bayesian) data augmentation approach using the complete data likelihood. For the particular appli¬ 
cation to M/j-type models and SECR, see for example Coull and Agresti (1999); Borchers and Efford 
(2008); Gimenez and Choquet (2010) (for a classical numerical integration approach) and Durban and Elston 
(2005); Royle et al. (2007, 2009); King and Brooks (2008); Royle and Dorazio (2012) (for Bayesian 
data augmentation approaches). We briefly describe the approaches in turn. 


2.1.1. Marginal data likelihood. Eor a general individual heterogeneity model, the marginal data 
likelihood may not be available in closed form (exceptions exist where the heterogeneity component 
is described as a finite mixture model or inhnite Beta distribution). In this case, the corresponding 
likelihood is given in equation (2.1) as a product of integrals. Eor computational efficiency, we 
are able to combine like terms in the likelihood corresponding to each unique encounter history 
(corresponding to the combined capture history and observed individual heterogeneity values). 
Notationally, let D denote the set of possible encounter histories; the capture history for G D; 
e^^ the individual heterogeneity terms for encounter history cj G D; the unobserved individual 
heterogeneity terms for encounter history a; G D and the number of individuals with encounter 
history cj. The marginal data likelihood can be expressed as, 
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Thus, this likelihood requires the estimation of a series of integrals each of dimension (at most) 
dim(S), where typically dim(S) is small. Eor example, in the presence of q time invariant continuous 
covariates, dim(S) = q, for model Mh, dim(S) = 1 and for the standard SECR model dim(§) = 
2 (see Examples 1-3 above). The number of integrals in the marginal data likelihood is equal 
to the number of unique observed encounter histories plus one (corresponding to the encounter 
history of not being observed). Each integral can, in general, be approximated using standard 
integration techniques, such as Gauss-Hermite quadrature, grid-based approaches etc. Thus the 
computational efficiency of this approach will be dependent on dim(S) and the number of unique 
encounter histories observed. Eor closed population models, dim(S) is typically very small. This 
(approximate) likelihood can be estimated using standard optimisation techniques to obtain the 
associated MLEs of the model parameters. 
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2.1.2. Complete data likelihood. The Bayesian complete data likelihood approach specifies the 
unobserved individual heterogeneity terms, as auxiliary variables (or additional parameters). 

The joint posterior distribution of the parameters and auxiliary variables is then formed and given 

by, 


7r(fV,0,77,e^®^|a;,e‘^^^) oc fc{x,e\N,e,rj)p{N,e,ri) 

= fx{x\N,e,e)fe{e\N,'q)p{N,e,'q), 

where fc{x, e| A^, 6, rj) denotes the complete data likelihood; fx{x\N, 0, e) the conditional likelihood 
of the observed data (conditional on the full set of individual heterogeneity terms); fe{e\N,T]) the 
individual heterogeneity component; and p{N,0,r]) the prior density specified on N, 6 and rj. The 
posterior density of only the model parameters, 7r{N,6,r]\x,e^^^), is obtained by integrating out 
over the auxiliary variables, However, the integration is analytically intractable so that an 

MCMC approach is typically implemented, whereby we construct a Markov chain with stationary 
distribution equal to the joint posterior distribution, TT{N,0,ri,e^^^\x,e^^^), and subsequently 
estimates of the marginal posterior summary statistics of interest are obtained. 

An additional computational model fitting difficulty arises since e = {ei,..., e^r} and hence e 
is itself a function of the unknown parameter, N. To address this issue King and Brooks (2008) 
describe a reversible jump (RJ) MCMC algorithm for M/i-type models that is able to explore the 
joint posterior distribution, where the number of parameters is able to vary within the constructed 
Markov chain. This involved writing bespoke computer code. Alternatively, Durban and Elston 
(2005); Royle et al. (2007, 2009); Royle and Dorazio (2012) use data augmentation techniques that 
can be fitted in BUGS/JAGS. The underlying idea is to specify a super-population of size M, with 
associated individual random effect terms e, for i = 1,..., M. The encounter histories for individu¬ 
als n -|- 1,..., M correspond to not being observed within the study. Within the MCMC algorithm, 
the random effect term for each individual in this super-population is imputed in addition to a 
binary indicator variable, Zi for i = 1,... ,M, identifying which members of the super-population 
are members of the target population of interest (by definition Zi = 1 for i = l,...,n, i.e. for 
all individuals observed at least once within the study). This binary indicator variable has been 
implemented using two different techniques each with different consequences. Durban and Elston 
(2005) specify the binary variables, such that zi,..., zn = ^ and zjy+i ,..., zm = 0 (i.e. the indica¬ 
tor variables are ordered); whereas Royle et al. (2007, 2009) do not induce any such structure on 
the indicator variables relating to unobserved individuals, setting Zi = 1 for i = 1,... ,n and mod¬ 
elling each indicator variable Zi for z = n -|- 1,..., M. The estimate of N is obtained as the sum of 
non-zero indicator variables, i.e. N = YlfLi la other words Durban and Elston (2005) define the 
indicator variables, conditional on N, whereas Royle et al. (2007, 2009) define N, conditional on 
the indicator variables. For ease of reference we refer to the complete data likelihood data approach 
of Durban and Elston (2005) as CD:DE (complete data: Durban and Elston) and of Royle et al. 
(2007, 2009); Royle and Dorazio (2012) as CD:R (complete data: Royle). 

Several issues arise with regard to these super-population data augmentation approaches. For 
both approaches M needs to be specified and corresponds to an upper bound for the total population 
size. This necessarily leads to a trade-off between the size specified for M and the computational 
speed of the code. The larger the value of M, the greater the computational time due to the 
imputation of the random effect term (and binary indicator variable for CD:R) for each individual in 
the super-population. Too small a value for M will lead to a truncation of the posterior distribution 
and biased inference. In addition, for CD:R, since N is derived as a deterministic function of the 
indicator variables, it has a more limited prior specification (see Section 3.2 for further discussion 
regarding prior specification). Alternatively for the approach of CD:DE, due to the more restricted 
nature of the indicator variable specification, mixing issues can arise. To aid in the efficiency of 
the computational algorithm Durban and Elston (2005) advocate the use of a pseudo-prior for 
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the corresponding random effect terms for individuals not in the population (i.e. for for all 
i = N + 1,..., M). The pseudo-prior is obtained from an initial MCMC run, using the estimated 
posterior distribution for the random effect of an unobserved individual. For further discussion of 
data augmentation techniques (particularly focusing on CD:R), see for example, Link (2013) and 
Schofield and Barker (2014). 

In general, without any prior information, the choice of analysis (classical marginal data likeli¬ 
hood or Bayesian complete data likelihood) may be data dependent. In general, for a given dataset, 
there is a computational trade-off between these different approaches. The marginal data likelihood 
requires the numerical approximation of the integrals over the individual random effects; the com¬ 
plete data likelihood is fast to evaluate but the individual random effects need to be updated 
within the MCMC algorithm (using either RJMCMC or a super-population approach). To avoid 
the use of explicitly approximating multiple integrals or the need to use a super-population or trans- 
dimensional algorithm, we propose a hybrid semi-complete data likelihood approach. This involves 
numerical integration for that part of the likelihood corresponding to unobserved individuals (as 
in the marginal likelihood approach), while for the observed individuals any unobserved individual 
heterogeneity terms are treated as auxiliary variables within a data augmentation approach (as in 
the complete data likelihood approach). In this case, the number of auxiliary variables is known 
so that the dimension of the parameter space is known and fixed. Standard BUGS/JAGS soft¬ 
ware readily accommodates this approach, which involves approximation of only a single integral 
of dimension dim(S). We describe this approach in more detail next. 

3. Semi-complete data likelihood. We propose a semi-complete data likelihood approach, 
combining the complete data likelihood for the individuals that are observed within the study (i.e. 
individuals z = 1 ,..., n), with a marginal data likelihood for the individuals that are not observed 
within the study (i.e. individuals i = n -|- 1,..., N). The semi-complete likelihood is expressed in 
the form, 

fs{x,ei.n\N,0,rj) = fx*{x\N,e,ri,ei.,n)fe{ei:n\N,rj) 

where ei-n = {ei,...,e„}; fx*ix\N,9,r],€i:n) denotes the conditional likelihood of the capture 
histories conditional on the model parameters {N, 6 and r/) and individual heterogeneity terms 
for the observed individuals only (ei:n); and /e(ei:„|A^, zy) the joint probability density function of 
the individual heterogeneity component for the observed individuals. Further, we have the follow¬ 
ing conditional likelihood functions: fx*{xi-.n\N,G, for the capture histories of the observed 
individuals only, conditional on the model parameters and individual heterogeneity terms for the 
observed individuals (dropping the dependence on rj since these are conditionally independent 
given ei:„); fx*{xn-\-i:N\N,0,r]) for the capture histories of the unobserved individuals, condi¬ 
tional on the model parameters; and fx*{xi\9,ri) for the capture history for unobserved individual 
z = n-|-l,...,A^, given the capture probability and individual heterogeneity model parameters (in 
the latter two cases dropping the conditioning on ei-n)- Then, letting Xa-.b = {xa, ■ ■ ■ ,*&}, we can 
express the conditional likelihood in the form: 

fx*{x\N,9,rt,ei.,n) = /®* (a;i:n|A^, eun)/®* (®n+l;Af|A^, j?) 

jY! UL 

(M- M ^ n 

i=l i=n+l 


(3.1) 
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where 1 — p* denotes the probability of not being observed within the study (or conversely p* 
denotes the probability of being seen at least once within the study) such that, 

(3.2) l-p*=f /a;(a; = O|0,ea,)/e(ea,|?7)dea,, 

and = 0 denotes the encounter history of a single individual who is unobserved within the study; 
/a,(cj = 0\6,ri,ei^) the conditional likelihood function associated with an individual not observed 
within the study and f^{ecj\6,ri) the probability density function of the associated individual het¬ 
erogeneity terms for an individual not observed within the study. The product in equation (3.1) 
corresponds to the likelihood of the encounter histories, for an individual observed at least once 
within the study, conditional on the individual heterogeneity terms. The latter term corresponds 
to the contribution to the likelihood relating to the unobserved individuals. 

An alternative (equivalent) model specification is given by 

1 ^ 

(3.3) fx*{x\N,e,ri,ei.,n) oc fx{xi\e, e^) x (p*)’"(l - p*)^~'^, 

where p* is as above. The first term corresponds to the conditional likelihood of the observed 
capture histories, given that each of these individuals has been observed within the study and 
the corresponding individual heterogeneity terms. The second term corresponds to the Binomial 
probability of observing the number of individuals in the study, given the total population size. 

We note that the semi-complete likelihood reduces to a single integral (over the dimension of the 
individual heterogeneity terms, i.e. dim(S)). This is in contrast to the marginal data likelihood which 
is a product of integrals (see Section 2.1.1), where the number of additional integrals corresponds 
to the number of unique encounter histories observed. 

3.1. Bayesian implementation. Notationally, we let and denote the set of observed 
and unobserved individual heterogeneity terms for the observed individuals, respectively. The joint 
posterior distribution for the model parameters and unobserved individual heterogeneity terms for 
the observed individuals is given by, 

7r(Af,0,J7,e^f“|a;,e^^*) cx fs{x,ei.,n\N,9,r])p{N,e,ri) 

(3.4) = fx*{x\N,e,r],ei.,n)fe{ei:n\N,r])p{N,e,r]), 

where fs{x, ei:„| A", 0, r]) is the semi-complete data likelihood. Note that, as is typically the case, we 
assume that the priors specified for the total population size and model parameters are independent, 
so that p{N, 6, T]) = p{N)p{0)p{r]). 

We use a standard Bayesian data augmentation approach for obtaining inference on the pos¬ 
terior distribution of interest, TT{N,6,r]\x,ef.^). The number of auxiliary variables needed within 
this Bayesian data angmentation approach, using the semi-complete likelihood, is fixed and sim¬ 
ply equal to (i.e. the auxiliary variables correspond to the number of unobserved individual 

heterogeneity terms of observed individuals). This is in contrast to the use of the joint posterior 
distribution of the model parameters and all nnobserved individual heterogeneity terms, 
given in equation (2.2)), since , en+i-.n} where A is a parameter to be estimated. A 

number of different approaches have been proposed to fit individual heterogeneity models. These in¬ 
clude trans-dimensional algorithms using reversible jump MCMC (King and Brooks, 2008), a joint 
posterior conditional MCMC algorithm (Fienberg et al, 1999) for Rasch-type (Mf^) models and 
super-population data augmentation techniques. The first two approaches require bespoke code, 
while the super-population data angmentation approaches can be implemented within BUGS/JAGS 
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(Durban and Elston, 2005; Royle et ai, 2007; Royle and Dorazio, 2012) but require the specifica¬ 
tion of an upper bound M and imputation of the (M — n) individual random effect terms en+i-.M 
(and dependent on the exact coding approach, M binary indicator variables). 

Using the semi-complete data likelihood and corresponding posterior distribution given in equa¬ 
tion (3.4), including only the heterogeneity terms for the observed individuals, permits standard 
(non-trans-dimensional) MCMC updating algorithms (such as the Metropolis-Hastings algorithm) 
to obtain inference on the parameters 6, rj and N. However, the semi-complete data likelihood 
removes the necessity to impute the terms e-n+i-M and the need to specify an upper bound on the 
total population size, in general (see Section 3.2). Consequently, the models can be immediately 
htted within BUGS/JAGS packages (see Section 4 for further discussion and the Web Appendix 
for example JAGS code), with an explicit prior distribution specified on N. The trade-off of using 
the posterior distribution with semi-complete data likelihood, given in equation (3.4), is that the 
integral in equation (3.2) needs to be explicitly (numerically) estimated. However, in general this 
will be of very low dimension (often only one or two dimensions) for closed population models and 
so computationally fast and able to be accurately estimated (for example using Gaussian quadra¬ 
ture). We compare the complete data likelihood and semi-complete data likelihood approaches in 
Section 4 using JAGS for two different applications. 


3.2. Prior specification for N. We briefly discuss possible prior distributions that are com¬ 
monly specihed on N and the corresponding Bayesian (and BUGS/JAGS) implementation. For 
the Bayesian data augmentation approach of Royle et al. (2007) (approach GD:R), the prior on 
N is only dehned implicitly, given the prior specihcation on the indicator function relating to the 
probability that an individual in the super-population is a member of the population of interest, de¬ 
noted if. The most common form of induced prior on N is the Uniform prior. However, Link (2013) 
showed that the uninformative prior if ~ C/[0,1] which induces the discrete uniform prior on N can 
lead to undesirable properties. Link (2013) therefore recommended the prior if: ~ Heta(0.001,1) 
which is easy implemented in BUGS/JAGS and induces an approximate Jeffreys’ prior on N. More 
generally, specifying the prior ~ Beta{a,b) induces the prior N ~ Beta — Binomial{M,a,b), 
where M is the super-population upper bound. This is a fairly flexible prior structure, but the 
computational limitations with regard to specifying a suitable value of M remain. 

For the complete data likelihood approach of Durban and Elston (2005) (approach GD:DE) and 
the semi-complete data likelihood approach an explicit prior is directly specified on N. Thus, any 
arbitrary distribution (specified on the set of non-negative integers) can be specihed on the total 
population size. For example, Jeffreys’ prior is a commonly used uninformative prior, given by 
p{N) oc N~^ (see for example, Madigan and York (1997); King and Brooks (2008)). We note that 
specifying Jeffreys’ prior, and using the semi-complete data likelihood expression given in equation 
(3.3) leads to a standard posterior conditional distribution for Y, i.e., 


{N — n)\x, 0,r] ^ Neg — Bin{n,p*), 


for p* given in equation (3.1)^. Consequently, for Jeffreys’ prior, the Gibbs sampler can be imple¬ 
mented for updating N within the MCMC algorithm. In general, if the prior or posterior conditional 
distribution for N is of (closed or) standard form this also simplihes the specihcation of the model in 
BUGS/JAGS, since this prior or posterior conditional distribution can be explicitly specihed in the 
model component (see the Web Appendix for sample JAGS code for the above Negative-Binomial 
posterior conditional distribution case). See also Fienberg et al. (1999) for further discussion. 


^We use the form of the Negative Binomial distribution such that for X 
function is given by, 

\ n 1 )! nf., \x 

xKn-iy. ^ ■ 

This is the functional form of the distribution used with BUGS/JAGS. 


Neg — Bin(n, q) the probability mass 
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Alternative prior distributions include p{N) oc N~‘^ for some positive constant c, proposed by 
Fienberg et al. (1999). For c > 1 the tail of the distribution for N decays faster than for Jeffreys’ 
prior; while c < 1 leads to a heavier tailed distribution. Alternatively, for an informative prior dis¬ 
tribution for N, a Poisson or Poisson-Gamma (equivalently a Negative-Binomial) prior distribution 
is often specihed on N (King and Brooks, 2001). It can also be noted that specifying N Po{X) 
and A ~ F(J, J) for small 6 provides another approximate Jeffreys’ prior for N. These alternative 
prior distributions are able to be implemented within BUGS/JAGS (typically using the zeros or 
ones trick, Lunn et al. (2013) - see the Web Appendix for associated sample JAGS code). 

3.3. Special case. We note that the approach presented by Bonner and Schofield (2014) is a 
special case of the semi-complete data likelihood approach applied to a covariate model. In par¬ 
ticular, Bonner and Schofield (2014) consider a time invariant individual covariate model given in 
Example 1 of Section 2. Using the terminology presented above, so that the notation differs to 
that given in Bonner and Schofield (2014), they describe the particular case where = ei-n and 
^Mis _ i ]2 other words, the individual heterogeneity terms are known for individuals ob¬ 

served within the study (though it is implied in their discussion that the approach is more generally 
applicable). The posterior distribution is then formed analogous to Equation (3.4). The probability 
of not being observed within the study, given in Equation (3.2) is estimated using Monte Garlo 
integration. 

4. Examples. We consider two real examples, relating to model Mh and SECR, described 
in Section 2. We note that as with all performance metrics for comparing the efficiency of differ¬ 
ent model-fitting approaches these are dependent on numerous factors, such as the programming 
language, specific application, data, model specification (including the pseudo-priors specified for 
the super-population approach), initial starting values and machine used. In order to be able to 
draw sensible comparisons for each example we present results obtained from same machine and 
language using the JAGS codes provided in Web Appendix A. 

4.1. Model Mh - snowshoe hares. To demonstrate our proposed semi-complete data likelihood 
approaches for model Mh, we revisit the snowshoe hare data originally examined in the seminal pa¬ 
per of Otis et al. (1978) and subsequently analyzed by many others (for example Coull and Agresti, 
1999; Dorazio and Royle, 2003; Royle et al., 2007; Link, 2013). Over T = 6 days of trapping, n = 68 
hares were captured with observed frequencies n = (25,22,13,5,1,2)' where nt = I{yi = t) 
and Hi = Ylj'=i ^ij for t = 1,..., T. We assume logit(pji) = a + Ci and e* ~ N{0, for i = 1,..., N 
and t = 1,... ,T, with 0 = {a} and rj = {a^}. 

We fit the semi-complete data likelihood and complete data likelihood Bayesian super-population 
(GD:R and GD:DE) approaches in R (R Gore Team, 2014) using the rjags package (Plummer, 
2013, see Web Appendix A for JAGS code). For each analysis we specify the priors, a ~ A^(0,100) 
and c7^ ~ F“^(0.01,0.01). We specify Jeffreys’ prior for N, for the semi-complete data likelihood 
and GD:DE. For ease of comparison with CD:R we set ■0 ~ Reta(0.001,1), which induces an 
approximate (truncated) Jeffreys’ prior for N on 1,..., M (Link, 2013). We note that we consider 
two JAGS specifications for the semi-complete data likelihood. The first approach (SGDl) uses the 
Jeffreys’ prior specification for N explicitly in the model component of the code. However, since 
Jeffreys’ prior is improper we need to specify an upper bound for N, which we again denote by 
M (essentially this is a truncated Jeffreys’ prior at M). The second approach (SCD2) specifies the 
(predictive) posterior conditional distribution for N — n, which is of Negative-Binomial form (see 
Section 3.2). 

Following Link (2013), we specify an upper bound of M = 1000 for the maximum total population 
size for the complete data likelihood super-population approaches and the first semi-complete data 
likelihood approach (SGDl) in JAGS. For the semi-complete data likelihood approach, the integral 
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Table 1 

Posterior summaries for snowshoe hare abundance (N) under model Mh using the semi-complete data likelihood 
(SCD) approach, CD:R and CD:DE. The semi-complete data likelihood approaches correspond to specifying the 
prior for N (SCDl) and the posterior conditional distribution for N — n (SCD2) in the model component of the 
JAGS code. For SCDl, CD:R and CD:DE, we specify an upper limit of M = 1000. Effective sample size (ESS) and 
effective sample size per second (ESS/s) are included for each approach. A total of 30 million iterations are used in 

each case with the realisations thinned by 10. 


method 

mean 

median 

SD 

95% Cl 

ESS 

ESS/s 

SCDl 

100.3 

93 

32.8 

(74, 171) 

168347 

7.67 

SCD2 

101.1 

93 

74.9 

(74, 173) 

167680 

7.74 

CD:R 

100.6 

93 

32.7 

(74, 171) 

13080 

0.10 

CD;DE 

101.3 

93 

36.2 

(74, 178) 

9626 

0.03 


in Equation (3.2) is evaluated using Gauss-Hermite quadrature: 


(4.1) 




Wi 


^ [l + exp [\/2avj + a)] 


T ’ 


where Wj and Vj are the weights and nodes corresponding to q quadrature points (sensu McClintock et ai, 
2009). The degree of accuracy of this approximation increases with q, and larger q is required for 
larger a. For our analyses, we specify q = 100. 

For each approach, we ran three chains of 10 million iterations (after initial pilot tuning and 
burn-in) from overdispersed starting values, thinning the realisations by 10 for memory storage 
purposes. Chain convergence was assessed based on visual inspection and Brooks-Gehnan-Rubin 
diagnostics (no lack of convergence was identified). On a computer running 64-bit Windows 7 
(3.4GHz Intel Core i7 processor, 16Gb RAM), the analyses required about 6.1 hrs for the first 
semi-complete data likelihood (prior distribution for N specified) approach, 6.0 hrs for the second 
semi-complete data likelihood (posterior conditional distribution for N — n specified), 35.1 hrs for 
CD:R and 83.3 hours for CD:DE. We note that the run times should be interpreted comparatively, 
as they will in general differ across different computers as a result of different processors, operating 
systems etc. The marginal posterior summaries are provided in Table 1, coupled with the effective 
sample sizes (per second) for each approach. 

Although setting M = 1000 may appear conservative, this did appear to influence the skewness 
of the right tail of the marginal posterior distribution for N relative to the (unbounded) posterior 
distribution for N when using the second semi-complete data likelihood approach (SCD2). We 
therefore reran the first semi-complete data likelihood (SCDl) analysis with M = 10000 leading 
to posterior summary results more similar to the second complete data likelihood approach {N 
posterior mean = 100.9, median = 93, SD = 56.1, 95% credible interval (Cl) = (74, 172)), but with 
noticeably reduced effective sample size (ESS = 74928) and increased computation time (ESS/s = 
2.81). Nevertheless, specifying larger M for the first semi-complete data likelihood approach comes 
at considerably less computational cost compared to the super-population complete data likelihood 
approaches (CD:R and CD:DE). Avoidance of the need to specify M when using BUGS/JAGS 
remains an advantage of the general semi-complete data likelihood approach (this is true even 
when using Jeffreys’ prior on N by specifying the posterior conditional distribution for N — n m 
the model component of the code). 

For approach SCDl, using an explicit Negative-Binomial or Beta-Binomial approximation to 
Jeffreys’ prior (code provided in Web Appendix A) unsurprisingly lead to similar results in terms 
of ESS and ESS/s as for the use of the explicit (truncated) Jeffreys’ prior. However, within the 
model specification code, using the distributions’ hierarchical form where an auxiliary variable is 
introduced for the Poisson mean or Binomial probability and imputed within the MCMC algorithm 
lead to lower ESS and ESS/s as a result of poorer mixing due to posterior correlation between 
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parameters. We do not consider these prior specifications further. 

Finally, we note that q = 100 appeared to be sufficient in the Gauss-Hermite quadrature approach 
for these analyses, but in general proper specification of q will be case dependent. For example, 
using our estimated posterior median a = —1.2 and the 99.9% quantile cr = 3.3, Equation (4.1) 
with q = 100 is accurate to a precision of five decimal places. However, for a = W, q = 100 it is only 
accurate to two decimal places. Care must therefore be taken when specifying q using the semi- 
complete data likelihood approach in JAGS. If computation speed is of little concern Equation (3.2) 
could alternatively be approximated in OpenBUGS using the inbuilt integral function, which also 
has an inbuilt default value for q. 

4.2. Model SECR - gibbons. To illustrate the proposed semi-complete data likelihood approach 
in the context of SECR models we use acoustic survey data from a population of northern yellow¬ 
cheeked gibbon from northeastern Cambodia. These data were collected from 13 replicate survey 
locations, each consisting of a 3 by 1 linear array of listening posts spaced 0.5km apart. Each 
listening post was manned by a single human observer who recorded the timing of calls at each and 
an estimated compass bearing to each detected gibbon group. Recaptured groups were determined 
using the estimated bearings and detection times. Over T = 1 survey days a total of n = 77 gibbon 
groups were detected across the 13 arrays. We specify the half-normal function for g of the form, 

P^,t = exp -J . 

For each analysis we specify the prior a ~ U[0, 10] and assume that the home range centres are 
uniformly distributed over the given area, i.e. f^{ei\r]) = ^ where A is the area of S for each 
i = 1,..., iV (in this case A = 546km^). Thus we set ^|J ~ Beta{0.001, 1) for the super-population 
approach GD:R and Jeffreys’ prior for N for the complete data likelihood (GD:DE) semi-complete 
likelihood approaches. 

As in Section 4.1 we fit both forms of the semi-complete data likelihood (Equations (3.1) and 
(3.3)) and the super-population complete data likelihood Bayesian approaches CD:R and GD:DE 
using the rjags package (Plummer, 2013, see Web Appendix B for sample JAGS code). For the 
complete data likelihood approaches and first semi-complete data likelihood (specifying Jeffreys’ 
prior on N within the model component of the JAGS code) we specify an upper bound of M = 1000 
for the discrete support of N. For both semi-complete likelihoods the integral in Equation (3.2) was 
approximated by a summation over a rectangular grid of 4200 points. Note that a suitable choice 
of grid will be case dependent, with increases in accuracy resulting from greater spatial extents and 
decreased distances between neighbouring grid points, but at the expense of computational time. 
An exploratory analysis suggested that the grid used was relatively conservative, achieving good 
numerical accuracy. 

To compare the performance of the different approaches, each MCMG algorithm is run for 500,000 
iterations, following a burn-in period of 10,000 iterations (no lack of convergence was identified for 
simulations of this length). On a computer running Windows Server 2008 R2 Enterprise (3.1GHz 
Intel Xeon GPU E5-2687, 256Gb RAM), the analyses required about 46.6 minutes for the first 
semi-complete data likelihood (SCDl; specifying (truncated) Jeffreys’ prior on N in the model 
component) approach, 42.3 minutes for the second semi-complete data likelihood (SGD2; specifying 
the posterior conditional distribution for N — n), 2.5 hours for GD:R and 6.8 hours for CD:DE. As 
for the snowshoe hare example, marginal posterior summaries were similar for all parameters using 
all approaches, but the semi-complete data likelihood approaches required far less computation 
time and yielded greater effective sample sizes than the data-augmented complete data likelihood 
approaches (Table 2). 
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Table 2 

Posterior summaries for gibbon group abundance (N) under the SECR models using the semi-complete data 
likelihood (SCD) approach, CD:R and CD:DE. The semi-complete data likelihood approaches correspond to 
specifying the prior for N (SCDl) and the posterior conditional distribution for N — n (SCD2) in the model 
component of the JAGS code. Eor SCDl, CD:R and CD:DE, we specify an upper limit of M = 1000. Effective 
sample size (ESS) and effective sample size per second (ESS/s) are included for each approach. A total of 500,000 

iterations are used in each case. 


model 

mean 

median 

SD 

95% Cl 

ESS 

ESS/s 

SCDl 

357.1 

328 

176.2 

(119, 766) 

2763 

1.01 

SCD2 

357.7 

327 

178.4 

(120, 775) 

3872 

1.56 

CD:R 

355.3 

326 

176.9 

(118, 768) 

865 

0.09 

CD;DE 

362.7 

338 

173.2 

(122, 765) 

622 

0.03 


5. Discussion. For closed population models, the semi-complete data likelihood specifies the 
joint probability density function of the model parameters and associated unobserved individual 
heterogeneity terms for only those individuals observed, conditional on the observed capture histo¬ 
ries and observed individual heterogeneity components. This likelihood is specified as an integral of 
the individual heterogeneity component for the unobserved individuals. The integral is analytically 
intractable but of dimension equal to the dimension of the individual heterogeneity component of 
the model, and hence typically small. This permits the the use of standard (efficient) numerical ap¬ 
proximation techniques to estimate the integral (for example, in OpenBUGS, the inbuilt integral 
function can be used to conduct one dimensional integration; with similar inbuilt functions in R for 
one or multi-dimensional integrals). The semi-complete data likelihood approach can be applied to 
a range of different individual heterogeneity models. 

Using this semi-complete data likelihood within a Bayesian analysis of closed capture-recapture 
data in the presence of individual heterogeneity, removes the need for trans-dimensional algo¬ 
rithms to explore the posterior distribution of the parameters due to the “unknown number of 
parameters” problem. Consequently, the models can be fitted efficiently in standard software, such 
as BUGS/JAGS without using a super-population approach. The semi-complete data likelihood 
approach is significantly more efficient than the previous super-population approaches, as demon¬ 
strated in Section 4, where the improvement for the examples that we considered using the codes 
provided in the Web Appendix was up to two orders of magnitude. The improvement is in terms 
of both computational time and effective sample sizes (as a result of improved mixing within the 
MCMC algorithm). The efficiency of the super-population approaches is heavily dependent on the 
upper limit specihed for the super-population, M. This makes the Bayesian approach feasible for 
fitting to a significantly wider range of data, particularly for spatially explicit capture-recapture, 
where the use of a Bayesian data augmentation technique can be particularly inefficient. In general, 
the ESS and ESS/s for the different approaches is dependent on numerous factors including the 
exact form of the model specification, the pseudo-priors specified in the super-population approach, 
initial starting values and computer on which the simulations are being run. 

This semi-complete data approach has been developed for closed population models in the pres¬ 
ence of individual heterogeneity. As discussed in Example 1 of Section 2 the inclusion of additional 
observable individual level covariates is immediate and can be seen to be a generalisation of the 
Monte Carlo in MCMC approach proposed by Bonner and Schoheld (2014) (see Section 3.3). The 
individual heterogeneity terms correspond to the covariate values and are typically known when 
individuals are observed, though this need not be the case (missing covariate values for individ¬ 
uals observed within the study can again be treated as auxiliary variables within the complete 
data likelihood component). In the presence of time-varying continuous individual covariates the 
increase in dimension of the necessary integral in the associated marginal data likelihood can 
be reduced by efficiently approximating the underlying state process as a hidden Markov model 
(Langrock and King, 2013). The approach can also be immediately applied to other forms of data. 
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For example, these include stopover models permitting arrivals to, and departures from, the study 
population (Pledger et al, 2009) and conventional distance sampling (Buckland et al, 2001). For 
the latter case the capture history is a univariate binary term (1 if an individual is observed and 0 
if unobserved), the individual heterogeneity component is the perpendicular distance of the indi¬ 
vidual from the line/point transect (known for observed individuals), assumed to have a uniform 
distribution (for line transects) or triangular distribution (for point transects), see for example. 
Equation (7.10) on page 141 of Borchers et al. (2002). Further work lies in identifying and develop¬ 
ing similar approaches for different forms of data. In addition, for more general Bayesian analyses, 
highly correlated parameters often leads to inefficient MCMC algorithm, due to poor mixing. To 
address this issue, a reparameterisation may often be used and/or block-updates implemented. An 
alternative approach, motivated by this semi-complete data approach, would be to identify and 
integrate out (using a numerical approximation) the highly correlated parameters. This is an area 
of current research. 
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